Abstract — Message passing over a factor graph can be considered
a generalization of many well-known algorithms for the efficient
marginalization of multivariate functions. A specific instance of
the algorithm is obtained by selecting an appropriate commutative
semiring for the range of the function to be marginalized. Examples
include the Viterbi algorithm, obtained on a max-product semiring,
and the forward-backward algorithm, obtained on a sum-product
semiring.

In this paper, the Entropy Message Passing (EMP) algorithm is
developed. It operates over an entropy semiring, previously
introduced in automata theory. It is shown how the EMP can be
used for efficient computation of the model entropy and of the
complex expressions which appear in the Expectation Maximization
and the gradient descent algorithms. The EMP can also be seen as
a generalization of the Sum-Product Algorithm, and this connection
is derived.

Index Terms — factor graphs, graphical models, message passing,
Sum-Product Algorithm, commutative semiring, entropy,
Expectation Maximization, gradient methods.

I. Introduction 

The efficient marginalization of a multivariate function is
important in areas such as signal processing, artificial
intelligence, and digital communications. In certain cases,
when the function can be represented by a cycle-free factor graph,
the exact marginal value can be obtained by the message passing
algorithm [1]-[4], which sends messages over the edges and
processes them in the nodes of the factor graph. The general rules
for message processing depend on the semiring that corresponds to
the range of the function to be marginalized. A wide variety of
well-known algorithms can be derived as instances of the message
passing algorithm by selecting an appropriate semiring and factor
graph structure. For example, the choice of the max-product
semiring yields the Viterbi algorithm, while the sum-product
semiring leads to the forward-backward algorithm [4].

In this paper, we develop a message passing algorithm that
operates over an entropy semiring. The entropy semiring was
introduced in [5] for the efficient computation of the relative
entropy between probabilistic automata. That computation is
performed by running a well-known dynamic programming algorithm,
the Floyd-Warshall algorithm, on the entropy semiring. In this
paper, we translate the ideas from [5] into the language of factor
graphs and message passing algorithms.
The Entropy Message Passing (EMP), as we have named the
algorithm, is used for calculating the entropy of tree-structured
probabilistic models, generalizing the previously developed
algorithm [6] that works only for chains. Furthermore, the
EMP unifies the work on the Expectation Maximization and
the steepest descent message passing algorithms proposed in
[7]-[13]; it can also be considered a generalization of the
Sum-Product Algorithm [1]-[3].

The paper is organized as follows: In section II we explain 
the process of passing the messages over a factor graph while 
considering the Sum-Product Algorithm. The entropy semiring 
is defined in section III, where we prove the factorization 
lemma that allows us to develop the EMP in section IV. In 
section V we consider the previously mentioned applications 
of the EMP, whereas the possible extensions and further 
applications are given in section VI. Finally, in the appendix 
we show how the entropy of probabilistic automata can be 
computed using the Floyd-Warshall algorithm that operates 
on the entropy semiring. 

II. Factor Graphs and the Sum-Product 
Algorithm 

Let $f$ be a real multivariate function which depends on the
set of variables $\mathbf{x} = \{x_n\}_{n=1}^{N}$ and satisfies the
factorization property

$$f(\mathbf{x}) = \prod_{m \in \mathcal{M}} f_m(\mathbf{x}_m) \tag{1}$$

for a set of indices $\mathcal{M}$. In the expression (1) each factor
$f_m(\mathbf{x}_m)$ depends on $\mathbf{x}_m \subseteq \mathbf{x}$ and the subsets
$\mathbf{x}_m$ cover $\mathbf{x}$. The factorization (1) can be graphically
represented by a factor graph [1]-[3].




Fig. 1. The factor graph that corresponds to the factorization
$f_A(x_1) f_B(x_2) f_C(x_1, x_2, x_3) f_D(x_1, x_4) f_E(x_2, x_5)$.

A factor graph consists of the variable nodes (drawn as circles),
the factor nodes (drawn as squares) and the connections
between the nodes, where the variable node $n$ and the factor
node $m$ are connected if and only if the factor $f_m$ depends
on the variable $x_n$. Since connections exist only between
factor nodes and variable nodes, the factor graph is bipartite.
An example of a factor graph that depicts the factorization

$$f(x_1, x_2, x_3, x_4, x_5) = f_A(x_1) f_B(x_2) f_C(x_1, x_2, x_3) f_D(x_1, x_4) f_E(x_2, x_5) \tag{2}$$

is given in Fig. 1.

Factor graphs allow the solution of two important
inference problems [14]:

1) The marginalization problem:

$$Z_n(x_n) = \sum_{\mathbf{x} \setminus x_n} \prod_{m \in \mathcal{M}} f_m(\mathbf{x}_m), \tag{3}$$

where $\sum_{\mathbf{x} \setminus x_n}$ denotes summation over all the variables
from $\mathbf{x}$ except for $x_n$, and

2) The normalization problem:

$$Z = \sum_{\mathbf{x}} \prod_{m \in \mathcal{M}} f_m(\mathbf{x}_m). \tag{4}$$

The solution of the second problem can easily be obtained
from the solution of the first problem by summing:

$$Z = \sum_{x_n} Z_n(x_n); \tag{5}$$

therefore, in the following paragraphs, we are concerned with 
the solution of the first problem. 

A. The Sum-Product Algorithm 

The Sum-Product Algorithm (SPA) is an efficient method for
the marginalization of a multivariate function [1]-[3]. It operates
as a message passing algorithm over the factor graph of
the function to be marginalized. The computed marginal value is
exact for a cycle-free factor graph, but the algorithm can also
be applied to a graph with cycles, where it gives an approximate
solution [15]-[17]. In this paper we consider cycle-free
(tree-structured) factor graphs.

There are two types of messages being sent over the factor
graph:

1) the messages $q_{n \to m}(x_n)$ from the variable to factor
nodes and

2) the messages $r_{m \to n}(x_n)$ from the factor to variable
nodes,

where the variable and factor nodes participating in the
message passing process are denoted with $n$ and $m$. Note
that both types of messages are functions of the variable
corresponding to the variable node involved in the message
passing.

The messages are initialized to $q_{n \to m}(x_n) = 1$ and
$r_{m \to n}(x_n) = f_m(x_n)$, for all variable nodes $n$ and
factor nodes $m$ at the leaves of the factor graph, for all possible
values $x_n$. After that the messages are passed toward the
root that corresponds to the variable for which the marginal
value is computed. The message from a node to its parent
is computed after the messages from all descendants are
received, according to the following rules:

$$q_{n \to m}(x_n) = \prod_{m' \in \mathcal{N}(n) \setminus m} r_{m' \to n}(x_n) \tag{6}$$

and

$$r_{m \to n}(x_n) = \sum_{\mathbf{x}_m \setminus x_n} f_m(\mathbf{x}_m) \prod_{n' \in \mathcal{N}(m) \setminus n} q_{n' \to m}(x_{n'}). \tag{7}$$

Here, $\mathcal{N}(n) \setminus m$ denotes all the nodes that are neighbors
of the node $n$ except for the node $m$, and $\sum_{\mathbf{x}_m \setminus x_n}$ denotes a
sum over all the variables $\mathbf{x}_m$ that are arguments of $f_m$ except
$x_n$. The process is terminated at the root, where the marginal
function is computed according to:

$$Z_n(x_n) = \prod_{m \in \mathcal{N}(n)} r_{m \to n}(x_n). \tag{8}$$
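The rules (6)-(8) can be sketched in a few lines of Python for the factorization (2). This is only an illustrative sketch: the binary alphabet, the factor tables and all function names (`q_msg`, `r_msg`, `marginal`, `brute_force_marginal`) are assumptions made for the example and do not come from the paper. The recursion computes each message on demand, so the leaf initializations fall out as the empty-product and empty-sum cases.

```python
import itertools
import math

DOMAIN = (0, 1)  # assumed binary alphabet for every variable

# Hypothetical positive factors for the factorization (2):
# name -> (scope, function of the scope variables in that order).
FACTORS = {
    "A": (("x1",), lambda a: 1.0 + a),
    "B": (("x2",), lambda b: 2.0 - b),
    "C": (("x1", "x2", "x3"), lambda a, b, c: 1.0 + a * b + 2.0 * c),
    "D": (("x1", "x4"), lambda a, d: 1.0 + a + 2.0 * d),
    "E": (("x2", "x5"), lambda b, e: 3.0 - b * e),
}
VARIABLES = ("x1", "x2", "x3", "x4", "x5")


def factors_of(n):
    """Factor nodes adjacent to the variable node n."""
    return [m for m, (scope, _) in FACTORS.items() if n in scope]


def q_msg(n, m):
    """Variable-to-factor message, rule (6); equals 1 at a leaf variable."""
    incoming = [r_msg(m2, n) for m2 in factors_of(n) if m2 != m]
    return {v: math.prod(r[v] for r in incoming) for v in DOMAIN}


def r_msg(m, n):
    """Factor-to-variable message, rule (7); equals f_m(x_n) at a leaf factor."""
    scope, f = FACTORS[m]
    others = [u for u in scope if u != n]
    incoming = {u: q_msg(u, m) for u in others}
    out = {}
    for v in DOMAIN:
        total = 0.0
        for vals in itertools.product(DOMAIN, repeat=len(others)):
            assign = dict(zip(others, vals))
            assign[n] = v
            term = f(*(assign[u] for u in scope))
            for u, val in zip(others, vals):
                term *= incoming[u][val]
            total += term
        out[v] = total
    return out


def marginal(n):
    """Marginal Z_n(x_n) at the root n, rule (8)."""
    incoming = [r_msg(m, n) for m in factors_of(n)]
    return {v: math.prod(r[v] for r in incoming) for v in DOMAIN}


def brute_force_marginal(n):
    """Direct evaluation of (3); exponential in the number of variables."""
    out = {v: 0.0 for v in DOMAIN}
    for vals in itertools.product(DOMAIN, repeat=len(VARIABLES)):
        a = dict(zip(VARIABLES, vals))
        out[a[n]] += math.prod(
            f(*(a[u] for u in scope)) for scope, f in FACTORS.values()
        )
    return out
```

Calling `marginal("x3")` and `brute_force_marginal("x3")` should agree up to rounding, and summing either result over its values gives the normalization constant of (5).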

When considering the SPA, we supposed that the codomain of the
function to be marginalized is the set of real numbers equipped
with the standard operations $+$ and $\times$. Nevertheless,
the algorithm still works when the codomain is an arbitrary
semiring (see the next section for the definition). The generalized
form of the algorithm can be obtained straightforwardly
by replacing the operations $+$ and $\times$ on the set of real
numbers with the operations $\oplus$ and $\otimes$ from the semiring

(D,f£l,(J3l- 

III. The Entropy Semiring 

In this section we introduce the algebraic notions that will
be useful for the development of the EMP algorithm. First, we
give the definition of a commutative semiring [19].

Definition 1: The system $(\mathcal{K}, \oplus, \otimes, 0, 1)$ is called a
commutative semiring if:

1) The operations $\oplus$ and $\otimes$ are associative and commutative;

2) The equalities $k \oplus 0 = k$ and $k \otimes 1 = k$ hold for all
$k \in \mathcal{K}$;

3) The operation $\otimes$ distributes over $\oplus$, i.e., for all
$a, b, c \in \mathcal{K}$ the following equalities hold:

$$(a \oplus b) \otimes c = (a \otimes c) \oplus (b \otimes c), \tag{9}$$
$$c \otimes (a \oplus b) = (c \otimes a) \oplus (c \otimes b). \tag{10}$$

Some common commutative semirings are the sum-product
semiring $(\mathbb{R}_+, +, \times, 0, 1)$, the Boolean semiring
$(\{0, 1\}, \vee, \wedge, 0, 1)$ and the max-product semiring
$(\mathbb{R}_+, \max, \times, 0, 1)$, where $\mathbb{R}$ denotes the set of real
numbers and $\mathbb{R}_+$ its nonnegative part.
Other semirings used in message passing algorithms can
be found in [4]. In this paper we consider the entropy semiring
[5] (also called the expectation semiring [20]).

Definition 2: The entropy semiring is a tuple

$$\left( \mathbb{R}^2, \oplus, \otimes, (0, 0), (1, 0) \right),$$

where the operations $\oplus$ and $\otimes$ are defined with:

$$(x_1, y_1) \oplus (x_2, y_2) = (x_1 + x_2, \; y_1 + y_2); \tag{11}$$
$$(x_1, y_1) \otimes (x_2, y_2) = (x_1 x_2, \; x_1 y_2 + x_2 y_1), \tag{12}$$
for all $(x_1, y_1), (x_2, y_2) \in \mathbb{R}^2$.

Here, we prove an important lemma about factorization
in an entropy semiring. The lemma will be useful for the
derivation of the EMP algorithm in the next section.

Lemma 1: Let $\mathcal{M}$ be a finite set of indices. Then, for all
$(a_m, b_m) \in \mathbb{R}^2$, $m \in \mathcal{M}$, the following equality holds:

$$\bigotimes_{m \in \mathcal{M}} (a_m, b_m) = \Big( \prod_{m \in \mathcal{M}} a_m, \; \sum_{m \in \mathcal{M}} b_m \prod_{j \in \mathcal{M} \setminus \{m\}} a_j \Big). \tag{13}$$

Proof: We prove the lemma by induction over the
cardinality of $\mathcal{M}$. Without loss of generality, suppose that
the sets $\mathcal{M}$ have the form $\{1, 2, \ldots, k\}$ where $k$ is a
natural number. If $\mathcal{M}$ has two elements, the
equality (13) reduces to the definition of multiplication in an
entropy semiring:

$$(a_1, b_1) \otimes (a_2, b_2) = (a_1 a_2, \; a_1 b_2 + a_2 b_1).$$

Now, let the equality hold for some $k$-element set
$\mathcal{M}_k = \{1, 2, \ldots, k\}$:

$$\bigotimes_{m \in \mathcal{M}_k} (a_m, b_m) = \Big( \prod_{m \in \mathcal{M}_k} a_m, \; \sum_{m \in \mathcal{M}_k} b_m \prod_{j \in \mathcal{M}_k \setminus \{m\}} a_j \Big).$$

Using this, and using the equality $\mathcal{M}_{k+1} = \mathcal{M}_k \cup \{k + 1\}$,
it is easy to obtain (13) for the $(k+1)$-element set
$\mathcal{M}_{k+1} = \{1, 2, \ldots, k + 1\}$:

$$\bigotimes_{m \in \mathcal{M}_{k+1}} (a_m, b_m) = \bigotimes_{m \in \mathcal{M}_k} (a_m, b_m) \otimes (a_{k+1}, b_{k+1})$$
$$= \Big( \prod_{m \in \mathcal{M}_{k+1}} a_m, \; b_{k+1} \prod_{m \in \mathcal{M}_k} a_m + \sum_{m \in \mathcal{M}_k} b_m \prod_{j \in \mathcal{M}_{k+1} \setminus \{m\}} a_j \Big)$$
$$= \Big( \prod_{m \in \mathcal{M}_{k+1}} a_m, \; \sum_{m \in \mathcal{M}_{k+1}} b_m \prod_{j \in \mathcal{M}_{k+1} \setminus \{m\}} a_j \Big),$$

which proves the lemma.

IV. The Entropy Message Passing Algorithm

Let $w(\mathbf{x})$ be a multivariate function whose codomain is an
entropy semiring $\mathcal{K}$ and let the factorization

$$w(\mathbf{x}) = \bigotimes_{m \in \mathcal{M}} w_m(\mathbf{x}_m) \tag{14}$$

hold for a set of indices $\mathcal{M}$, where each factor $w_m(\mathbf{x}_m)$
depends on $\mathbf{x}_m \subseteq \mathbf{x}$ and the subsets $\mathbf{x}_m$ cover $\mathbf{x}$. Further, let
the factors have the form

$$w_m(\mathbf{x}_m) = \big( f_m(\mathbf{x}_m), \; f_m(\mathbf{x}_m) g_m(\mathbf{x}_m) \big), \tag{15}$$

where $f_m(\mathbf{x}_m)$ and $g_m(\mathbf{x}_m)$ are real functions which
depend on the same set of variables $\mathbf{x}_m$. Using the equality
(13) it is easy to obtain:

$$w(\mathbf{x}) = \Big( \prod_{m \in \mathcal{M}} f_m(\mathbf{x}_m), \; \prod_{m \in \mathcal{M}} f_m(\mathbf{x}_m) \sum_{k \in \mathcal{M}} g_k(\mathbf{x}_k) \Big). \tag{16}$$

Hence, if a function has the form (16) then it can be factorized
as (14), where the factors are given by (15).

With a fast computation of:

$$(Z, H) = \bigoplus_{\mathbf{x}} w(\mathbf{x}), \tag{17}$$

we solve two problems:

1) The computation of the expression:

$$Z = \sum_{\mathbf{x}} \prod_{m \in \mathcal{M}} f_m(\mathbf{x}_m), \tag{18}$$

which is the normalization problem considered in section II
and

2) The computation of the expression:

$$H = \sum_{\mathbf{x}} \prod_{m \in \mathcal{M}} f_m(\mathbf{x}_m) \sum_{k \in \mathcal{M}} g_k(\mathbf{x}_k), \tag{19}$$

which is the general form of the different problems described
in the next section. These problems are the key motivation for
our work.

If the factor graph corresponding to $w(\mathbf{x})$ has a tree structure,
the computation (17) can be performed by message
passing over the entropy semiring. The tree structure condition
is satisfied when the factor graph of the function:

$$f(\mathbf{x}) = \prod_{m \in \mathcal{M}} f_m(\mathbf{x}_m) \tag{20}$$

has no cycles, since the factors $w_m(\mathbf{x}_m)$ (equation (15))
depend on the same set of variables as $f_m(\mathbf{x}_m)$. To perform
the summation (17) we follow the procedure from section II:
we calculate the marginal:

$$\big( Z_n(x_n), H_n(x_n) \big) = \bigoplus_{\mathbf{x} \setminus x_n} w(\mathbf{x}) \tag{21}$$

for a variable $x_n$, and subsequently we obtain the total sum
by:

$$\bigoplus_{\mathbf{x}} w(\mathbf{x}) = \bigoplus_{x_n} \big( Z_n(x_n), H_n(x_n) \big). \tag{22}$$

In the following paragraphs, we formalize the discussion by
using the Entropy Message Passing (EMP) algorithm:

1) Initialization: Set the messages from all variable and
factor nodes at the leaves to:

$$q_{n \to m}(x_n) = (1, 0), \tag{23}$$
$$r_{m \to n}(x_n) = \big( f_m(x_n), \; f_m(x_n) g_m(x_n) \big). \tag{24}$$

2) Induction: After receiving the messages from all descendants,
compute the messages to the parents for all variable and
factor nodes in the tree:

$$q_{n \to m}(x_n) = \bigotimes_{m' \in \mathcal{N}(n) \setminus m} r_{m' \to n}(x_n); \tag{25}$$

$$r_{m \to n}(x_n) = \bigoplus_{\mathbf{x}_m \setminus x_n} \big( f_m(\mathbf{x}_m), \; f_m(\mathbf{x}_m) g_m(\mathbf{x}_m) \big) \otimes \bigotimes_{n' \in \mathcal{N}(m) \setminus n} q_{n' \to m}(x_{n'}). \tag{26}$$
3) Termination: At the root, compute the marginal value
and the total sum:

$$\big( Z_n(x_n), H_n(x_n) \big) = \bigotimes_{m \in \mathcal{N}(n)} r_{m \to n}(x_n), \tag{27}$$

$$(Z, H) = \bigoplus_{x_n} \big( Z_n(x_n), H_n(x_n) \big). \tag{28}$$

The EMP has the same asymptotic computational complexity
as the SPA, since addition and multiplication in an entropy
semiring are realized via a constant number of additions and
multiplications of real numbers. Precise complexity estimates
for message passing algorithms can be found in [4].
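The following Python sketch illustrates the EMP on a small tree and compares it with a direct evaluation of (18) and (19). The model (three pairwise factors around the variable `x2`, a three-letter alphabet, and the particular functions f_m and g_m) and all function names are hypothetical choices for the example, not taken from the paper. Messages are computed recursively, so the leaf initialization (23)-(24) appears as the empty-product and empty-sum case of (25)-(26).

```python
import itertools
from functools import reduce

ZERO, ONE = (0.0, 0.0), (1.0, 0.0)  # identities of the entropy semiring


def oplus(p, q):
    """Semiring addition, definition (11)."""
    return (p[0] + q[0], p[1] + q[1])


def otimes(p, q):
    """Semiring multiplication, definition (12)."""
    return (p[0] * q[0], p[0] * q[1] + q[0] * p[1])


DOMAIN = (0, 1, 2)  # assumed common alphabet
# Hypothetical tree-structured model: name -> (scope, f_m, g_m).
FACTORS = {
    "a": (("x1", "x2"), lambda u, v: 1.0 + u + 2.0 * v, lambda u, v: float(u - v)),
    "b": (("x2", "x3"), lambda u, v: 1.0 + u * v, lambda u, v: 0.5 * u + v),
    "c": (("x2", "x4"), lambda u, v: 2.0 + abs(u - v), lambda u, v: float(u * v)),
}
VARIABLES = ("x1", "x2", "x3", "x4")


def factors_of(n):
    """Factor nodes adjacent to the variable node n."""
    return [m for m, (scope, _, _) in FACTORS.items() if n in scope]


def q_msg(n, m):
    """Rule (25); reduces to the pair (1, 0) of (23) at a leaf variable."""
    incoming = [r_msg(m2, n) for m2 in factors_of(n) if m2 != m]
    return {v: reduce(otimes, (r[v] for r in incoming), ONE) for v in DOMAIN}


def r_msg(m, n):
    """Rule (26); reduces to (24) at a leaf factor."""
    scope, f, g = FACTORS[m]
    others = [u for u in scope if u != n]
    incoming = {u: q_msg(u, m) for u in others}
    out = {}
    for v in DOMAIN:
        total = ZERO
        for vals in itertools.product(DOMAIN, repeat=len(others)):
            a = dict(zip(others, vals))
            a[n] = v
            args = [a[u] for u in scope]
            term = (f(*args), f(*args) * g(*args))  # factor of the form (15)
            for u, val in zip(others, vals):
                term = otimes(term, incoming[u][val])
            total = oplus(total, term)
        out[v] = total
    return out


def emp(root):
    """Termination (27)-(28): returns the pair (Z, H)."""
    incoming = [r_msg(m, root) for m in factors_of(root)]
    marg = [reduce(otimes, (r[v] for r in incoming), ONE) for v in DOMAIN]
    return reduce(oplus, marg, ZERO)


def brute_force():
    """Direct evaluation of (18) and (19)."""
    Z = H = 0.0
    for vals in itertools.product(DOMAIN, repeat=len(VARIABLES)):
        a = dict(zip(VARIABLES, vals))
        prod, g_sum = 1.0, 0.0
        for scope, f, g in FACTORS.values():
            args = [a[u] for u in scope]
            prod *= f(*args)
            g_sum += g(*args)
        Z += prod
        H += prod * g_sum
    return Z, H
```

Any variable can serve as the root: `emp("x1")` and `emp("x2")` should both match `brute_force()` up to rounding. The multiplication rule also reproduces Lemma 1 on small inputs; for example, multiplying (2, 3), (5, 7) and (11, 13) gives (110, 449).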

V. Applications 

In this section we show how the EMP relates and applies to 
the Sum-Product Algorithm, the entropy computation, and the 
optimization techniques such as the Expectation Maximization 
and the gradient ascent algorithm. 

A. The EMP as a generalization of the SPA 

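Let $(a, b)$ be an element of an entropy semiring $\mathcal{K}$. We
define the z-part of $(a, b)$ with:

$$(a, b)^{(z)} = a. \tag{29}$$

Further, let $\mathcal{M}$ be a set of indices and $c_m \in \mathcal{K}$ for $m \in \mathcal{M}$.
The definition of the entropy semiring addition and Lemma 1
then imply the following equalities:

$$\Big( \bigoplus_{m \in \mathcal{M}} c_m \Big)^{(z)} = \sum_{m \in \mathcal{M}} c_m^{(z)}, \tag{30}$$

$$\Big( \bigotimes_{m \in \mathcal{M}} c_m \Big)^{(z)} = \prod_{m \in \mathcal{M}} c_m^{(z)}. \tag{31}$$

Using these equalities we can derive the z-parts of the EMP
equations (23)-(28) in the following manner.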

1) Initialization: The z-parts of the messages from variable
and factor nodes at the leaves are initialized to:

$$q^{(z)}_{n \to m}(x_n) = 1, \tag{32}$$
$$r^{(z)}_{m \to n}(x_n) = f_m(x_n). \tag{33}$$

2) Induction: After receiving the z-parts of the messages
from all the descendants, compute the z-parts of the messages
to the parents for all variable and factor nodes in the tree:

$$q^{(z)}_{n \to m}(x_n) = \prod_{m' \in \mathcal{N}(n) \setminus m} r^{(z)}_{m' \to n}(x_n), \tag{34}$$

$$r^{(z)}_{m \to n}(x_n) = \sum_{\mathbf{x}_m \setminus x_n} f_m(\mathbf{x}_m) \prod_{n' \in \mathcal{N}(m) \setminus n} q^{(z)}_{n' \to m}(x_{n'}). \tag{35}$$

3) Termination: At the root, the marginal value is computed
using the z-parts of the incoming messages:

$$Z_n(x_n) = \prod_{m \in \mathcal{N}(n)} r^{(z)}_{m \to n}(x_n), \tag{36}$$

and the total sum is:

$$Z = \sum_{x_n} Z_n(x_n). \tag{37}$$

These equations are exactly the equations of the Sum-Product
Algorithm considered in section II. Consequently, the
SPA can be seen as the z-part of the EMP.

B. Entropy computation of a partially observed probabilistic 
model 

In this section we show how the EMP can be used for
an efficient computation of the state sequence entropy of
partially observed probabilistic models. An algorithm for such
a computation has previously been proposed in [6], but only
for chain-structured models. Applying the EMP, we can
generalize this algorithm to an arbitrary probabilistic model
whose factor graph has no cycles.

Let a partially observed model be given with the probability
distribution $P(\mathbf{x}, \mathbf{y})$, where $\mathbf{x} = \{x_1, \ldots, x_n\}$ denotes a
hidden variable sequence of length $n$ and $\mathbf{y} = \{y_1, \ldots, y_m\}$
denotes an observation sequence of length $m$. The entropy of
the model $P(\mathbf{x}, \mathbf{y})$ for the given observation sequence
is given by:

$$H(X \mid Y = \mathbf{y}) = -\sum_{\mathbf{x}} P(\mathbf{x} \mid \mathbf{y}) \log_2 P(\mathbf{x} \mid \mathbf{y}). \tag{38}$$

By use of Bayes' theorem and the additivity of the
logarithm, this expression can be transformed to:

$$H(X \mid Y = \mathbf{y}) = -\sum_{\mathbf{x}} \frac{P(\mathbf{x}, \mathbf{y})}{P(\mathbf{y})} \log_2 P(\mathbf{x}, \mathbf{y}) + \log_2 P(\mathbf{y}). \tag{39}$$

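Note that the probability distribution $P(\mathbf{x}, \mathbf{y})$ can be considered
a function depending only on the vector variable $\mathbf{x}$,
since $\mathbf{y}$ is observed and can be treated as a constant.
Let $P(\mathbf{x}, \mathbf{y})$ factorize as

$$P(\mathbf{x}, \mathbf{y}) = \prod_{m \in \mathcal{M}} f_m(\mathbf{x}_m), \tag{40}$$

where the factorization corresponds to a cycle-free factor graph.
If $g_m(\mathbf{x}_m) = \log_2 f_m(\mathbf{x}_m)$, the expression (39) can be written as:

$$H(X \mid Y = \mathbf{y}) = -\frac{H}{Z} + \log_2 Z, \tag{41}$$

where

$$Z = \sum_{\mathbf{x}} \prod_{m \in \mathcal{M}} f_m(\mathbf{x}_m) \tag{42}$$

and

$$H = \sum_{\mathbf{x}} \prod_{m \in \mathcal{M}} f_m(\mathbf{x}_m) \sum_{k \in \mathcal{M}} g_k(\mathbf{x}_k). \tag{43}$$

The previous expressions have the form (18) and (19), and
can be computed with the EMP, which solves the problem of
efficient computation of the model entropy.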
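As an illustration of (41)-(43), the sketch below computes the state sequence entropy of a small hidden Markov model for a fixed observation sequence. The model parameters, the observation sequence and the function names are assumptions made for the example. On a chain, the EMP reduces to a forward recursion in the entropy semiring, with each factor lifted to the pair $(f, f \log_2 f)$; the result is checked against the definition (38) evaluated by enumeration.

```python
import itertools
import math

# Hypothetical two-state hidden Markov model (all numbers are illustrative).
PRIOR = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.2, 0.8]]  # TRANS[i][j] = P(x_t = j | x_{t-1} = i)
EMIT = [[0.9, 0.1], [0.3, 0.7]]   # EMIT[i][o]  = P(y_t = o | x_t = i)
OBS = [0, 1, 1, 0]                # the observed sequence y
STATES = range(len(PRIOR))


def oplus(p, q):
    """Entropy semiring addition (11)."""
    return (p[0] + q[0], p[1] + q[1])


def otimes(p, q):
    """Entropy semiring multiplication (12)."""
    return (p[0] * q[0], p[0] * q[1] + q[0] * p[1])


def lift(f):
    """Factor of the form (15) with g = log2 f (requires f > 0)."""
    return (f, f * math.log2(f))


def factor(t, prev, cur):
    """f_t: the part of P(x, y) contributed by time step t."""
    step = PRIOR[cur] if t == 0 else TRANS[prev][cur]
    return step * EMIT[cur][OBS[t]]


def entropy_emp():
    """Forward pass over the chain in the entropy semiring, then (41)."""
    alpha = [lift(factor(0, None, s)) for s in STATES]
    for t in range(1, len(OBS)):
        new = []
        for s in STATES:
            acc = (0.0, 0.0)
            for p in STATES:
                acc = oplus(acc, otimes(alpha[p], lift(factor(t, p, s))))
            new.append(acc)
        alpha = new
    total = (0.0, 0.0)
    for pair in alpha:
        total = oplus(total, pair)
    Z, H = total
    return -H / Z + math.log2(Z)


def entropy_brute_force():
    """Definition (38), enumerating every hidden sequence."""
    joint = []
    for xs in itertools.product(STATES, repeat=len(OBS)):
        p = 1.0
        for t, s in enumerate(xs):
            p *= factor(t, xs[t - 1] if t else None, s)
        joint.append(p)
    Z = sum(joint)
    return -sum(p / Z * math.log2(p / Z) for p in joint)
```

The forward pass touches each pair of consecutive states once per time step, whereas the enumeration grows exponentially with the sequence length; both should return the same value up to rounding.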
C. Iterative optimization techniques

Suppose we wish to find

$$\hat{\theta} = \arg\max_{\theta} p(\theta) \tag{44}$$

with the parameter $\theta$ taking values from $\mathbb{R}$ or $\mathbb{R}^k$. We assume
that $p(\theta)$ is the marginal of a real-valued nonnegative function
$p(\mathbf{x}, \theta)$:

$$p(\theta) = \sum_{\mathbf{x}} p(\mathbf{x}, \theta). \tag{45}$$

In this section we consider two popular procedures for
solving the problem (44): the Expectation Maximization (EM)
and the gradient ascent algorithm. Both algorithms seek the
solution iteratively, with the parameter $\theta$ being re-estimated in
each iteration. In the following paragraphs we show how the
EMP can be used for the computations which appear here.
The key requirement for the function $p(\mathbf{x}, \theta)$ is that the
factorization

$$p(\mathbf{x}, \theta) = \prod_{m \in \mathcal{M}} p_m(\mathbf{x}_m, \theta) \tag{46}$$

be tree structured when $\theta$ is fixed and $p(\mathbf{x}, \theta)$ is considered
as a function of $\mathbf{x}$ only, although the whole factor graph of
$p(\mathbf{x}, \theta)$, as well as the factor graph of $p(\theta)$, may have cycles.
A similar constraint can be found in the previous papers which
consider the EM algorithm from the message passing point of
view [7]-[10], but with the slightly stricter requirement that the
factor graph of $p(\theta)$ should be cycle free.

We assume that the vector variable $\mathbf{x}$ takes values from a
discrete finite set. Message passing algorithms which deal
with these optimization techniques in the case of continuous
variables can be found in [10] and [11].

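1) The Expectation Maximization Algorithm: The EM
algorithm [3], [11] attempts to compute (44) as follows:

1) Choose an initial setting for the parameters $\theta^{\text{old}}$.

2) E-step: Evaluate $p(\mathbf{x}, \theta^{\text{old}})$.

3) M-step: Evaluate $\theta^{\text{new}}$ given by

$$\theta^{\text{new}} = \arg\max_{\theta} Q(\theta, \theta^{\text{old}}) \tag{47}$$

where

$$Q(\theta, \theta^{\text{old}}) = \sum_{\mathbf{x}} p(\mathbf{x}, \theta^{\text{old}}) \log p(\mathbf{x}, \theta). \tag{48}$$

4) While the convergence criterion is not satisfied, let
$\theta^{\text{old}} = \theta^{\text{new}}$ and return to step 2.

The M-step is usually performed by solving the equation

$$\nabla_{\theta} Q(\theta, \theta^{\text{old}}) = 0, \tag{49}$$

where $\nabla_{\theta}$ denotes the gradient operator. After substituting
(46) in (48), the expression (49) can be transformed into the
form we will use in the following text:

$$\sum_{\mathbf{x}} \prod_{m \in \mathcal{M}} p_m(\mathbf{x}_m, \theta^{\text{old}}) \sum_{k \in \mathcal{M}} \nabla_{\theta} \log p_k(\mathbf{x}_k, \theta) = 0. \tag{50}$$

It can be shown that the EM algorithm always leads to a
solution. Nevertheless, it becomes computationally demanding
as the number of steps required for its convergence and the
dimensionality of $\mathbf{x}$ grow. Yet, this problem can efficiently be
solved with the EMP when the gradients of the logarithms of the
factors in (46) depend linearly on $\theta$, i.e.

$$\nabla_{\theta} \log p_k(\mathbf{x}_k, \theta) = u_k(\mathbf{x}_k) \cdot \theta + v_k(\mathbf{x}_k) \cdot A, \tag{51}$$

where $A$ is a constant vector of the same dimensionality as
$\theta$ (see [10], [11] and [21] for examples). Accordingly, the
solution of (50) has the form:

$$\theta^{\text{new}} = -\frac{H_b}{H_a} A, \tag{52}$$

where

$$H_a = \sum_{\mathbf{x}} \prod_{m \in \mathcal{M}} p_m(\mathbf{x}_m, \theta^{\text{old}}) \sum_{k \in \mathcal{M}} u_k(\mathbf{x}_k) \tag{53}$$

and

$$H_b = \sum_{\mathbf{x}} \prod_{m \in \mathcal{M}} p_m(\mathbf{x}_m, \theta^{\text{old}}) \sum_{k \in \mathcal{M}} v_k(\mathbf{x}_k). \tag{54}$$

The expressions for $H_a$ and $H_b$ can efficiently be computed
with the EMP algorithm since both can be derived
from (19) by the settings $f_m(\mathbf{x}_m) = p_m(\mathbf{x}_m, \theta^{\text{old}})$ and
$g_k(\mathbf{x}_k) = u_k(\mathbf{x}_k)$ for $H_a$, and $f_m(\mathbf{x}_m) = p_m(\mathbf{x}_m, \theta^{\text{old}})$ and
$g_k(\mathbf{x}_k) = v_k(\mathbf{x}_k)$ for $H_b$.

2) The Gradient Ascent Algorithm: The previously described
procedure for parameter estimation can be applied when the
linear dependence (51) holds, but when the dependency is
nonlinear, an analytic solution for the M-step does not exist
in general. Instead, we can apply the gradient ascent algorithm
[11]-[13] to solve the optimization problem (44). Gradient
ascent seeks the maximum of a real nonnegative differentiable
function $p(\theta)$ by an iterative process:

$$\theta_{i+1} = \theta_i + \nabla_{\theta} p(\theta) \big|_{\theta_i}, \tag{55}$$

where $\nabla_{\theta} p(\theta) |_{\theta_i}$ denotes the gradient of $p(\theta)$ at the point
$\theta_i$. Since $p(\theta)$ is given by the marginal (45), the gradient can
be written as

$$\nabla_{\theta} p(\theta) \big|_{\theta_i} = \sum_{\mathbf{x}} \nabla_{\theta} p(\mathbf{x}, \theta) \big|_{\theta_i}. \tag{56}$$

If we apply Leibniz's rule to the factorization (46), the
previous expression becomes:

$$\nabla_{\theta} p(\theta) \big|_{\theta_i} = \sum_{\mathbf{x}} \prod_{m \in \mathcal{M}} f_m(\mathbf{x}_m) \sum_{k \in \mathcal{M}} g_k(\mathbf{x}_k), \tag{57}$$

where

$$f_m(\mathbf{x}_m) = p_m(\mathbf{x}_m, \theta_i) \tag{58}$$

and

$$g_k(\mathbf{x}_k) = \frac{\nabla_{\theta} p_k(\mathbf{x}_k, \theta)}{p_k(\mathbf{x}_k, \theta)} \bigg|_{\theta = \theta_i}. \tag{59}$$

Again, we have an expression of the form (19), so the gradient
(57) can be evaluated with the EMP algorithm.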
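A short Python sketch can make (57)-(59) concrete for a scalar parameter. With $g_k$ chosen as in (59), the second component of each semiring factor $(f_k, f_k g_k)$ is simply the derivative of $p_k$ with respect to $\theta$, so each factor is lifted to the pair (value, derivative). The three factors below, the binary chain structure and the function names are hypothetical; the result is compared with a central finite difference of the brute-force marginal (45).

```python
import itertools
import math
from functools import reduce

STATES = (0, 1)  # assumed binary alphabet


def oplus(p, q):
    """Entropy semiring addition (11)."""
    return (p[0] + q[0], p[1] + q[1])


def otimes(p, q):
    """Entropy semiring multiplication (12)."""
    return (p[0] * q[0], p[0] * q[1] + q[0] * p[1])


# Hypothetical factors on the chain x1 - x2 - x3. Each returns the pair
# (p_k, d p_k / d theta), i.e. (f_k, f_k * g_k) with g_k as in (59).
def unary(x1, th):
    return (math.exp(th * x1), x1 * math.exp(th * x1))


def pair_12(x1, x2, th):
    same = 1.0 if x1 == x2 else 0.0
    return (1.0 + th * th * same, 2.0 * th * same)


def pair_23(x2, x3, th):
    e = math.exp(-th * x2 * x3)
    return (1.0 + e, -x2 * x3 * e)


PAIRWISE = (pair_12, pair_23)


def value_and_gradient(th):
    """Chain-structured EMP: returns (p(theta), d p / d theta)."""
    alpha = [unary(s, th) for s in STATES]
    for factor in PAIRWISE:
        alpha = [
            reduce(oplus, (otimes(alpha[p], factor(p, s, th)) for p in STATES))
            for s in STATES
        ]
    return reduce(oplus, alpha)


def p_brute_force(th):
    """Marginal (45) by enumeration, used only as a reference."""
    total = 0.0
    for x1, x2, x3 in itertools.product(STATES, repeat=3):
        total += unary(x1, th)[0] * pair_12(x1, x2, th)[0] * pair_23(x2, x3, th)[0]
    return total
```

For instance, the second component of `value_and_gradient(0.7)` should agree with `(p_brute_force(0.7 + eps) - p_brute_force(0.7 - eps)) / (2 * eps)` for a small `eps`, which is how the sketch can be sanity-checked before the gradient is used in an update such as (55).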

Gradient ascent can also be used for the M-step of the
EM algorithm as in [12], [22] and [23]. In this case, $Q(\theta, \theta^{\text{old}})$
should be maximized by the iterative procedure:

$$\theta_{i+1} = \theta_i + \nabla_{\theta} Q(\theta, \theta^{\text{old}}) \big|_{\theta_i}. \tag{60}$$

The computation that appears here can also be performed with
the EMP, since the gradient of $Q(\theta, \theta^{\text{old}})$ reduces to (57) for

$$f_m(\mathbf{x}_m) = p_m(\mathbf{x}_m, \theta^{\text{old}}) \tag{61}$$

and

$$g_k(\mathbf{x}_k) = \frac{\nabla_{\theta} p_k(\mathbf{x}_k, \theta)}{p_k(\mathbf{x}_k, \theta)} \bigg|_{\theta = \theta_i}, \tag{62}$$

which can easily be shown.

VI. Conclusion 

In this paper, we have developed the Entropy Message Passing
algorithm, which extends the class of problems solvable by
means of factor graphs. We showed how Entropy Message
Passing can be used for efficient computation of the model
entropy and of the expressions which appear
in the Expectation Maximization algorithm and the gradient
methods. These results can be used to generalize other
algorithms, such as the entropy-based learning method [24] or
the model parameter estimation algorithm [23], which is one
direction of further work. Another direction arises from the
question of graph structure. Specifically, in this paper
we have considered tree-structured models. Nevertheless, the
body of work on Loopy Belief Propagation [15]-[17]
suggests that Entropy Message Passing should give satisfactory
approximate results for a wide class of factor graphs with
cycles.

VII. Appendix 
Computation of the Entropy of a Probabilistic 
Automaton 

The idea of entropy computation using an entropy semiring
has its origin in automata theory. In this appendix we
provide an overview of it (see [5] and [20] for the complete
discussion).

A weighted automaton $\mathcal{A}$ over a semiring $(\mathcal{K}, \oplus, \otimes, 0, 1)$
is a 7-tuple $(\Sigma, Q, I, F, E, \lambda, \rho)$ where:

• $\Sigma$ is the finite alphabet of the automaton,

• $Q$ is a finite set of states,

• $I \subseteq Q$ is the set of initial states,

• $F \subseteq Q$ is the set of final states,

• $E \subseteq Q \times (\Sigma \cup \{\epsilon\}) \times \mathcal{K} \times Q$ is a finite set of transitions,

• $\lambda : I \to \mathcal{K}$ is the initial weight function, and

• $\rho : F \to \mathcal{K}$ is the final weight function.

The automata considered here do not contain empty
$\epsilon$-transitions and have the weight function mappings equal to
1.

For a transition $e \in E$ we denote its previous state with
$p[e] \in Q$, the input label with $l[e] \in \Sigma$, the weight with
$w[e] \in \mathcal{K}$, and the next state with $n[e]$. A path $\pi$ is a
sequence of transitions $e_1 \cdots e_k$ for which $n[e_{i-1}] = p[e_i]$,
for $i = 2, \ldots, k$. The set of paths that start in the set of
initial states $I$ and end in the set of final states $F$ of $\mathcal{A}$ is
denoted by $\Pi_{\mathcal{A}}(I, F)$. The labeling function $l$ and the weight
function $w$ can be extended to paths by defining the label
of a path as the concatenation of the labels of its constituent
transitions, and the weight of a path as the $\otimes$-product of
the weights of its constituent transitions: $l[\pi] = l[e_1] \cdots l[e_k]$,
$w[\pi] = w[e_1] \otimes \cdots \otimes w[e_k]$.

We consider automata defined over closed semirings.
The semiring $\mathcal{K}$ is closed for the automaton $\mathcal{A}$ if for all weights
$w \in \mathcal{K}$ the infinite sum

$$w^{*} = \bigoplus_{n=0}^{\infty} \underbrace{w \otimes \cdots \otimes w}_{n} \tag{63}$$

is well defined and if associativity, commutativity, and
distributivity apply to countable sums.

A weighted automaton is said to be complete unambiguous
if any input label sequence $a$ corresponds to exactly one
path labeled with $a$. The norm of a complete unambiguous
automaton $\mathcal{A}$ is defined by:

$$S(\mathcal{A}) = \bigoplus_{\pi \in \Pi_{\mathcal{A}}(I, F)} w[\pi]. \tag{64}$$



If the automaton is defined over a closed semiring, the
computation can be performed using a generalization of the
classical Floyd-Warshall algorithm as follows.

1) Initialization: For all $q, s \in Q$, the auxiliary matrix $A$ is
initialized to the sum of weights of all possible transitions

$$A[q, s] = \bigoplus_{e:\, p[e] = q,\, n[e] = s} w[e]. \tag{65}$$

2) Induction: The algorithm is iterated for all $r \in Q$. At
each iteration the matrix $A$ is updated according to

$$A[q, s] = A[q, s] \oplus \left( A[q, r] \otimes A[r, r]^{*} \otimes A[r, s] \right), \tag{66}$$

for all $q, s \in Q$, where $A[r, r]^{*}$ is defined with (63).

3) Termination: After the last iteration the norm is computed
as

$$S(\mathcal{A}) = \bigoplus_{i \in I,\, f \in F} A[i, f]. \tag{67}$$



In the following paragraphs we show how the entropy of
probabilistic automata can efficiently be computed by use of
the Floyd-Warshall algorithm over an entropy semiring.

A complete unambiguous automaton defined over the
sum-product semiring $(\mathbb{R}_+, +, \times, 0, 1)$ is said to be probabilistic
if it is normalized, i.e. $S(\mathcal{A}) = 1$ and $0 < w[e] < 1$ for
each transition $e$. Its entropy is given by

$$H(\mathcal{A}) = -\sum_{\pi \in \Pi_{\mathcal{A}}(I, F)} w[\pi] \log w[\pi]. \tag{68}$$

The procedure for the entropy computation follows. First,
from the probabilistic automaton $\mathcal{A} = (\Sigma, Q, I, F, E, \lambda, \rho)$
we construct an automaton over the entropy semiring $\mathcal{K}$,
$\tilde{\mathcal{A}} = (\Sigma, Q, I, F, \tilde{E}, \lambda, \rho)$, which differs from $\mathcal{A}$ only in the
set of transitions $\tilde{E}$ that is built up in the following way. Every
edge $e \in E$ has only one corresponding edge $\tilde{e} \in \tilde{E}$ so that
$e$ and $\tilde{e}$ have the same input label, the same previous and the
same next state, while the weight of $\tilde{e}$ is given by

$$\tilde{w}[\tilde{e}] = \left( w[e], \; w[e] \log w[e] \right) \in \mathcal{K}. \tag{69}$$
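Consequently, every path $\pi = e_1 \cdots e_N$ in $\mathcal{A}$ corresponds to
exactly one path $\tilde{\pi} = \tilde{e}_1 \cdots \tilde{e}_N$ in $\tilde{\mathcal{A}}$ with the weight

$$\tilde{w}[\tilde{\pi}] = \bigotimes_{n=1}^{N} \tilde{w}[\tilde{e}_n]. \tag{70}$$

By substituting (69) in (70) and using Lemma 1, we obtain:

$$\tilde{w}[\tilde{\pi}] = \left( w[\pi], \; w[\pi] \log w[\pi] \right). \tag{71}$$

Hence, the norm of $\tilde{\mathcal{A}}$ has the form

$$S(\tilde{\mathcal{A}}) = \bigoplus_{\tilde{\pi} \in \Pi_{\tilde{\mathcal{A}}}(I, F)} \tilde{w}[\tilde{\pi}] = \left( S(\mathcal{A}), \; -H(\mathcal{A}) \right), \tag{72}$$

where $S(\mathcal{A}) = 1$ is the norm of the probabilistic automaton $\mathcal{A}$
and $H(\mathcal{A})$ is its entropy. As shown in [5], the entropy semiring
$\mathcal{K}$ is closed for $\tilde{\mathcal{A}}$ if $\mathcal{A}$ is probabilistic, and the norm can
efficiently be computed using the Floyd-Warshall algorithm,
yielding the solution of the problem in a similar way to the
EMP.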
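The procedure (65)-(67) over the entropy semiring can be sketched in Python as follows. Two points are assumptions of this sketch rather than statements from the text: the logarithm is taken to base 2, and the closure (63) is evaluated in closed form as $(a, b)^{*} = (1/(1-a), \, b/(1-a)^2)$ for $0 \le a < 1$, which follows from Lemma 1 since the $n$-fold product of $(a, b)$ is $(a^n, n a^{n-1} b)$. The function names and the toy automaton in the usage note are hypothetical.

```python
import math

ZERO = (0.0, 0.0)  # additive identity of the entropy semiring


def oplus(p, q):
    """Entropy semiring addition (11)."""
    return (p[0] + q[0], p[1] + q[1])


def otimes(p, q):
    """Entropy semiring multiplication (12)."""
    return (p[0] * q[0], p[0] * q[1] + q[0] * p[1])


def star(p):
    """Closure (63) in closed form; valid only for 0 <= a < 1 (assumption)."""
    a, b = p
    return (1.0 / (1.0 - a), b / (1.0 - a) ** 2)


def norm(num_states, transitions, initial, final):
    """Norm (72) of the lifted automaton.

    transitions: list of (source, target, probability) with probability > 0.
    Returns the pair (S(A), -H(A)), with the entropy measured in bits.
    """
    # Initialization (65), with edge weights lifted as in (69).
    A = [[ZERO] * num_states for _ in range(num_states)]
    for q, s, w in transitions:
        A[q][s] = oplus(A[q][s], (w, w * math.log2(w)))
    # Induction (66); every update reads the matrix from the previous iteration.
    for r in range(num_states):
        old = [row[:] for row in A]
        loop = star(old[r][r])
        for q in range(num_states):
            for s in range(num_states):
                through_r = otimes(otimes(old[q][r], loop), old[r][s])
                A[q][s] = oplus(old[q][s], through_r)
    # Termination (67).
    total = ZERO
    for i in initial:
        for f in final:
            total = oplus(total, A[i][f])
    return total
```

As a usage example, take two states with a self-loop of probability 0.5 on state 0 and a transition of probability 0.5 from state 0 to the final state 1. The accepted paths have probabilities $2^{-(n+1)}$, so the entropy is 2 bits, and `norm(2, [(0, 0, 0.5), (0, 1, 0.5)], [0], [1])` returns a pair whose first component is 1 and whose second component is -2.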



References

[1] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and
the sum-product algorithm," IEEE Trans. Inform. Theory, vol. 47, pp.
498-519, 2001.

[2] H.-A. Loeliger, "An introduction to factor graphs," IEEE Signal Proc.
Mag., pp. 28-41, Jan. 2004.

[3] C. M. Bishop, Pattern Recognition and Machine Learning. New York,
Springer, 2006.

[4] S. M. Aji and R. J. McEliece, "The generalized distributive law," IEEE
Trans. Inform. Theory, vol. 46, pp. 325-343, Mar. 2000.

[5] C. Cortes, M. Mohri, A. Rastogi and M. Riley, "On the Computation of
the Relative Entropy of Probabilistic Automata," International Journal
of Foundations of Computer Science, vol. 19, pp. 219-242, 2007.

[6] D. Hernando, V. Crespi and G. Cybenko, "Efficient Computation of
the Hidden Markov Model Entropy for a Given Observation Sequence,"
IEEE Trans. Inform. Theory, vol. 51, pp. 2681-2687, July 2005.

[7] A. W. Eckford and S. Pasupathy, "Iterative multiuser detection with
graphical modeling," in Proc. IEEE International Conference on Personal
Wireless Communications, Hyderabad, India, pp. 454-458, 2000.

[8] A. W. Eckford, "Channel estimation in block fading channels using
the factor graph EM algorithm," in Proc. 22nd Biennial Symposium on
Communications, Kingston, ON, Canada, pp. 44-46, 2004.

[9] J. Dauwels, S. Korl, and H.-A. Loeliger, "Expectation maximization as
message passing," in Proc. 2005 IEEE Int. Symp. Information Theory,
Adelaide, Australia, Sep. 4-9, 2005, pp. 583-586.

[10] J. Dauwels, A. Eckford, S. Korl, and H.-A. Loeliger, "Expectation
maximization as message passing - Part I: Principles and Gaussian
Messages," arXiv:0910.2832, Oct. 2009.

[11] H.-A. Loeliger, J. Dauwels, Junli Hu, S. Korl, Li Ping, and F. R.
Kschischang, "The factor graph approach to model-based signal processing,"
Proceedings of the IEEE, vol. 95, no. 6, pp. 1295-1322, June 2007.

[12] J. Dauwels, S. Korl, and H.-A. Loeliger, "Steepest descent on factor
graphs," in Proc. IEEE Information Theory Workshop, Rotorua, New
Zealand, Aug. Sep 28, 2005, pp. 42-46.

[13] H.-A. Loeliger, "Some remarks on factor graphs," in Proc. 3rd Int. Symp.
on Turbo Codes and Related Topics, Sept. 15, 2003, Brest, France, pp.
111-115.

[14] D. J. C. MacKay, Information Theory, Inference and Learning
Algorithms. Cambridge University Press, 2003.

[15] J. S. Yedidia, W. T. Freeman, and Y. Weiss, "Constructing Free Energy
Approximations and Generalized Belief Propagation Algorithms," IEEE
Trans. Inform. Theory, vol. 51, pp. 2282-2312, July 2005.

[16] K. P. Murphy, Y. Weiss, and M. I. Jordan, "Loopy belief propagation for
approximate inference: an empirical study," in Proc. Uncertainty in AI,
1999.

[17] Y. Weiss, "Correctness of local probability propagation in graphical
models with loops," Neural Computation, 12, pp. 141, 2000.

[18] N. Wiberg, "Codes and decoding on general graphs," Ph.D. dissertation,
Linkoping Univ., Linkoping, Sweden, 1996.

[19] W. Kuich and A. Salomaa, Semirings, Automata, Languages, Number
5 in EATCS Monographs on Theoretical Computer Science,
Springer-Verlag, Berlin, Germany, 1986.

[20] J. Eisner, "Parameter estimation for probabilistic finite-state
transducers," in Proceedings of the 40th Annual Meeting of the Association for
Computational Linguistics, Philadelphia, July 2002, pp. 18.

[21] O. Ronen, J. R. Rohlicek, and M. Ostendorf, "Parameter estimation
of dependence tree models using the EM algorithm," IEEE Signal
Processing Lett., Aug. 1995.

[22] E. Weinstein, M. Feder, and A. V. Oppenheim, "Sequential algorithms
for parameter estimation based on the Kullback-Leibler information
measure," IEEE Transactions on Acoustics, Speech, and Signal
Processing, vol. 38, pp. 1652-1654, Sept. 1990.

[23] V. Krishnamurthy and J. B. Moore, "On-line estimation of hidden
Markov model parameters based on the Kullback-Leibler information
measure," IEEE Trans. Signal Processing, vol. 41, pp. 2557-2573, Aug.
1993.

[24] G. Mann and A. McCallum, "Efficient computation of entropy gradient
for semi-supervised conditional random fields," in Human Language
Technologies, 2007.



